2022-04-04

Syllabus

  • Static word embeddings
    • Frequency based methods, word2vec, GloVe, fastText, evaluation of embeddings
  • Contextual word embeddings
    • ELMo, Transformers and attention, BERT, sentence embeddings, contrastive learning
  • Additional topics
    • Geometry of the embedding space, bias, sentiment, multilingual embeddings
  • Topological data analysis
    • Hyperbolic embeddings, singularities and topological polysemy

Motivation: Winograd schemas

  • The trophy doesn’t fit into the brown suitcase because it’s too large.
  • The trophy doesn’t fit into the brown suitcase because it’s too small.

Task: Co-reference resolution

Motivation: Winograd schemas

  • The city councilmen refused the demonstrators a permit because they feared violence.
  • The city councilmen refused the demonstrators a permit because they advocated violence.

Task: Co-reference resolution

  • easy for humans to solve
  • difficult for computers
    • solution relies on real-world knowledge and common sense reasoning

Motivation: Winograd schemas

  • I put the cake away in the refrigerator. It has a lot of butter in it.
  • I put the cake away in the refrigerator. It has a lot of leftovers in it.

Motivation: Garden-path sentences

  • The old man the boat.

  • The complex houses married and single soldiers and their families.
  • The horse raced past the barn fell.

Methods

Some of the word vectors from a 100-dimensional fastText embedding trained on a Wikipedia corpus, projected to 2 dimensions using t-SNE.
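The projection step can be sketched as follows. Loading the actual fastText model is out of scope here, so random 100-dimensional vectors stand in for the trained embeddings; only the t-SNE call is the point.

```python
# Sketch: project 100-d word vectors to 2-d with t-SNE.
# The vectors below are random stand-ins for real fastText embeddings.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
words = ["king", "queen", "apple", "banana", "car", "bus", "red", "blue"]
vectors = rng.normal(size=(len(words), 100))  # stand-in for fastText vectors

# perplexity must be smaller than the number of points
projected = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)
print(projected.shape)  # (8, 2)
```

The 2-d coordinates in `projected` are what gets scattered on the slide's plot, one point per word.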

Applications of word embeddings

  • Word-sense induction (WSI) or word-sense discrimination: the task is to identify the senses/meanings of a word
  • Output: clustering of contexts of the target word, or a clustering of words related to the target word

Example:

  • target word “cold”
  • collection of sentences:
    • “I caught a cold.”
    • “The weather is cold.”
    • “The ice cream is cold.”

Output: ?
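One possible answer, as a toy sketch: represent each context of "cold" as a bag-of-words vector (with the target word removed) and cluster the contexts. With these three sentences, the two temperature contexts share vocabulary and end up together.

```python
# Toy word-sense induction for "cold": cluster bag-of-words context vectors.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

contexts = [
    "I caught a cold .",
    "The weather is cold .",
    "The ice cream is cold .",
]
# drop the target word so clustering sees only the surrounding context
stripped = [c.lower().replace("cold", "").replace(".", "") for c in contexts]
X = CountVectorizer().fit_transform(stripped)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # the two temperature contexts land in the same cluster
```

Real WSI systems cluster richer context representations (e.g. contextual embeddings) instead of raw counts, but the shape of the output is the same: one cluster per induced sense.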

  • Word-sense disambiguation (WSD): relies on a predefined sense inventory, and the task is to resolve the ambiguity in context
  • Output: identifying which sense of a word is used in a sentence
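A minimal Lesk-style sketch of WSD: the sense inventory below is hand-written for illustration (a real system would use a lexicon such as WordNet), and the sense whose gloss shares the most words with the context wins.

```python
# Toy Lesk-style WSD: pick the sense whose gloss overlaps the context most.
SENSES = {  # hypothetical mini-inventory for "cold"
    "illness": "a common infection causing a cough and a runny nose",
    "temperature": "having a low temperature like ice or winter weather",
}

def lesk(context):
    ctx = set(context.lower().split())
    overlap = {s: len(ctx & set(g.split())) for s, g in SENSES.items()}
    return max(overlap, key=overlap.get)

print(lesk("i caught a cold and a cough"))   # illness
print(lesk("the weather is cold like ice"))  # temperature
```

Unlike WSI, the set of possible outputs is fixed in advance by the inventory.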

Part-of-speech tagging

  • grammatical tagging: decide which part of speech (noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection) a word in a text corpus belongs to

PoS may depend both on the definition of the word and on its context

  • in natural language, a large portion of word forms are ambiguous
  • example from Wikipedia:
    • “dogs” usually is a plural noun,
    • but can also be a verb as in the sentence “The sailor dogs the hatch.”
  • example where order matters:
    • “can of fish”
    • “we can fish”

Sub-categories for PoS tagging:

  • for nouns, the plural, possessive, and singular forms can be distinguished.
  • “case” (role as subject, object, etc.), grammatical gender, and so on
  • verbs are marked for tense, aspect, and other things
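The "can of fish" vs. "we can fish" ambiguity can be illustrated with a tiny hand-written tagger: the rule below (an ambiguous word is a verb after a pronoun, otherwise a noun) is purely illustrative, not how trained taggers work, but it shows that the previous tag is enough to separate the two readings.

```python
# Toy context-sensitive tagger: "can" and "fish" are ambiguous, and a
# hand-written rule uses the previous tag to decide. Illustration only.
LEXICON = {"we": "PRON", "of": "PREP"}
AMBIGUOUS = {"can", "fish"}

def tag(sentence):
    tags, prev = [], "START"
    for word in sentence.lower().split():
        if word in AMBIGUOUS:
            t = "VERB" if prev == "PRON" else "NOUN"
        else:
            t = LEXICON[word]
        tags.append((word, t))
        prev = t
    return tags

print(tag("can of fish"))  # [('can', 'NOUN'), ('of', 'PREP'), ('fish', 'NOUN')]
print(tag("we can fish"))  # [('we', 'PRON'), ('can', 'VERB'), ('fish', 'NOUN')]
```

Statistical taggers (HMMs, neural models) learn such context dependencies from annotated corpora instead of hard-coding them.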

Other tagging tasks:

Text classification

  • Document classification: spam / not spam

  • Review classification: positive / negative

  • Sentiment: positive / neutral / negative

  • single-label classification / multi-label classification

Generative and Discriminative Models

  • Generative models:
    • learn the underlying data distribution \(P(x, y) = P(x | y) \cdot P(y)\)
    • prediction: given an input \(x\), pick the class with the highest joint probability \(y = \mathop{\mathrm{argmax}}_{k} P(x | y = k) \cdot P(y = k)\)
      • maximum a posteriori (MAP) estimate
  • Discriminative models:
    • learn the boundaries between classes (i.e. learn how to use the features)
    • prediction: given an input \(x\), pick the class with the highest conditional probability \(y = \mathop{\mathrm{argmax}}_{k} P(y = k | x)\)
      • Maximum Likelihood Estimate (MLE) of parameters
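The generative MAP rule above can be implemented directly in a few lines of numpy. The sketch below estimates \(P(y)\) and \(P(x|y)\) from binary features under a naive independence assumption (i.e. a Bernoulli Naive Bayes model; the toy data is made up), then predicts by maximizing the joint log-probability.

```python
# Minimal generative classifier: estimate P(y) and P(x|y), then predict
# argmax_k P(x | y=k) * P(y=k)  (computed in log space).
import numpy as np

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])  # binary features
y = np.array([0, 0, 1, 1])

def fit(X, y, alpha=1.0):
    classes = np.unique(y)
    prior = np.array([(y == k).mean() for k in classes])            # P(y = k)
    theta = np.array([(X[y == k].sum(0) + alpha)                    # P(x_j = 1 | y = k)
                      / ((y == k).sum() + 2 * alpha) for k in classes])
    return classes, prior, theta

def predict(x, classes, prior, theta):
    # joint log-probability log P(x | y=k) + log P(y=k) for each class k
    logp = (np.log(theta) * x + np.log(1 - theta) * (1 - x)).sum(1) + np.log(prior)
    return classes[np.argmax(logp)]

classes, prior, theta = fit(X, y)
print(predict(np.array([1, 1, 0]), classes, prior, theta))  # 0
print(predict(np.array([0, 0, 1]), classes, prior, theta))  # 1
```

A discriminative counterpart (e.g. logistic regression) would instead fit \(P(y|x)\) directly, without modeling how the features are generated.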

TODO: How to do prediction

Bag of Words (BoW) assumption: word order does not matter
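The BoW assumption in action: two sentences with the same words in a different order map to identical count vectors.

```python
# Bag of Words: word order is discarded, only counts remain.
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
X = vec.fit_transform(["we can fish", "fish can we"]).toarray()
print((X[0] == X[1]).all())  # True
```

This is exactly what BoW classifiers trade away: the "can of fish" vs. "we can fish" distinction from the PoS slides is invisible to them.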

TODO

Static word embeddings

TODO

Frequency based methods

TODO

word2vec, GloVe, fastText

word2vec: (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013)
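The skip-gram-with-negative-sampling objective of word2vec can be sketched in plain numpy. Everything here is a toy (corpus, dimensionality, window, learning rate, number of epochs); real training uses gensim or the original C implementation, but the update rule is the one from Mikolov, Sutskever, et al. (2013): push target and observed-context vectors together, push target and sampled-negative vectors apart.

```python
# Toy skip-gram with negative sampling in plain numpy.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, window, k, lr = len(vocab), 8, 2, 3, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, dim))   # target-word vectors (the "embedding")
W_out = rng.normal(0, 0.1, (V, dim))  # context-word vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(50):
    for pos, word in enumerate(corpus):
        t = idx[word]
        for off in range(-window, window + 1):
            if off == 0 or not 0 <= pos + off < len(corpus):
                continue
            c = idx[corpus[pos + off]]
            # one observed (positive) context plus k sampled negatives
            for o, label in [(c, 1.0)] + [(int(n), 0.0) for n in rng.integers(0, V, k)]:
                grad = sigmoid(W_in[t] @ W_out[o]) - label
                out_old = W_out[o].copy()          # use pre-update value for both grads
                W_out[o] -= lr * grad * W_in[t]
                W_in[t] -= lr * grad * out_old

print(W_in.shape)  # one 8-d vector per vocabulary word
```

After training, `W_in` (or the sum of both matrices) is used as the word embedding table.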

Contextual word embeddings

  • I’m going to the bank to withdraw some money.
  • We’re sitting on the river bank with some friends.

Recurrent methods: ELMo

Transformers

  • attention

TODO
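The core Transformer operation, scaled dot-product attention \(\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V\), fits in a few lines of numpy (random matrices stand in for learned projections of token embeddings):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = softmax(scores)        # each row is a distribution over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries, d_k = 8
K = rng.normal(size=(5, 8))  # 5 keys
V = rng.normal(size=(5, 8))  # 5 values
out, weights = attention(Q, K, V)
print(out.shape, weights.shape)  # (4, 8) (4, 5)
```

Each output row is a weighted average of the value vectors, with weights determined by query-key similarity; this is what lets a token's representation depend on its context.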

Bidirectional Encoder Representations from Transformers (BERT)

Huggingface transformers

Sentence embeddings

  • Sentence-BERT
    • sentence-pair regression tasks like semantic textual similarity (STS)

Geometry of the embedding space

TODO

Bias

TODO

Sentiment

TODO

Multilingual embeddings

TODO

Topological data analysis

TODO

Hyperbolic embeddings

TODO

Singularities

TODO

  • the manifold hypothesis does not hold at all points of certain static word embeddings

(Jakubowski, Gasic, and Zibrowius 2020)

Thank you!

Organisation

  • Schedule: See Google sheet
  • Each week: talks by students (1 or 2 speakers per session, 70 minutes in total)
    • there should be enough time for questions and a discussion
  • Guest lecture?
  • The final grade is based on your presentation
  • Hand in your extended abstract (ideally .tex, .bib files and compiled .pdf; maximum 2 pages with references) via ILIAS

References

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” CoRR abs/1810.04805. http://arxiv.org/abs/1810.04805.

Jakubowski, Alexander, Milica Gasic, and Marcus Zibrowius. 2020. “Topology of Word Embeddings: Singularities Reflect Polysemy.” In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, 103–13. https://arxiv.org/abs/2011.09413.

Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language Processing. 2nd ed. Prentice Hall.

Mikolov, Tomás, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, edited by Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1301.3781.

Mikolov, Tomás, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” CoRR abs/1310.4546. http://arxiv.org/abs/1310.4546.